Abstract: In this work we consider the communication of
information in the presence of a causal adversarial jammer.
In the setting under study, a sender wishes to communicate
a message to a receiver by transmitting a codeword $x =
(x_1, \ldots, x_n)$ bit-by-bit over a communication channel. The
adversarial jammer can view the transmitted bits $x_i$ one at
a time, and can change up to a p-fraction of them. However,
the decisions of the jammer must be made in an online or
causal manner. Namely, for each bit $x_i$ the jammer's decision
on whether to corrupt it or not (and on how to change it) must
depend only on $x_j$ for $j \le i$. This is in contrast to the "classical"
adversarial jammer which may base its decisions on its complete
knowledge of x. We present a non-trivial upper bound on the
amount of information that can be communicated. We show
that the achievable rate can be asymptotically no greater than
$\min\{1 - H(p), (1 - 4p)^+\}$. Here $H(\cdot)$ is the binary entropy
function, and $(1 - 4p)^+$ equals $1 - 4p$ for $p \le 0.25$, and
0 otherwise.

I. Introduction 

Consider the following adversarial communication scenario.
A sender Alice wishes to transmit a message u to a
receiver Bob. To do so, Alice encodes u into a codeword
x and transmits it over a binary channel. The codeword
$x = x_1, \ldots, x_n$ is a binary vector of length n. However,
Calvin, a malicious adversary, can observe x and corrupt up
to a p-fraction of the n transmitted bits, i.e., pn bits.

In the classical adversarial channel model, e.g., [4], it is 
usually assumed that Calvin has full knowledge of the entire 
codeword x, and based on this knowledge (together with the 
knowledge of the code shared by Alice and Bob) Calvin can 
maliciously plan what error to impose on x. We refer to 
such an adversary as an omniscient adversary. For binary 
channels, the optimal rate of communication in the presence 
of an omniscient adversary has been an open problem in 
classical coding theory for several decades. The best known 
lower bound is given by the Gilbert-Varshamov bound [10],
[18], which implies that Alice can transmit at rate $1 - H(2p)$
to Bob. Conversely, the tightest upper bound was given by
McEliece et al. [12], and has a positive gap from the lower
bound for all $p \in (0, 1/4)$ (see Fig. 1).

In this work we initiate the analysis of coding schemes 
that allow communication against certain adversaries that are 
weaker than the omniscient adversary. We consider adversaries
that behave in a causal or online manner. Namely, for each bit
$x_i$, we assume that Calvin decides whether to change it or not
(and if so, how to change it) based on the bits $x_j$ for $j \le i$
alone, i.e., the bits that he has already observed. In this case
we refer to Calvin as a causal adversary.

Fig. 1. Bounds on capacity of the adversarial channel. The bold line in
purple is our upper bound of $\min\{1 - H(p), (1 - 4p)^+\}$.

Causal adversaries arise naturally in practical settings, 
where adversaries typically have no a priori knowledge of 
Alice's message u. In such cases they must simultaneously 
learn u based on Alice's transmissions, and jam the corre- 
sponding codeword x accordingly. This causality assumption 
is reasonable for many communication channels, both wired 
and wireless, where Calvin is not co-located with Alice. For
example, consider the scenario in which the transmission of
$x = x_1, \ldots, x_n$ is done during n channel uses over time,
where at time i the bit $x_i$ is transmitted over the channel.
Calvin can only corrupt a bit when it is transmitted (and
thus its error is based on its view so far). To decode the
transmitted message, Bob waits until all the bits have arrived.
As in the omniscient model, Calvin is restricted in the number 
of bits pn he can corrupt. This might be because of limited 
processing power or limited transmit energy. 

Recently, the problem of codes against causal adversaries 
was considered and solved by the authors [6] for large-$q$
channels, i.e., channels where Alice's codeword $x = x_1, \ldots, x_n$
is considered to be a vector of length n over a field of "large"
size q. Each symbol $x_i$ may represent a large packet of bits in
practice. Calvin is allowed to arbitrarily corrupt a p-fraction of
the symbols, rather than bits. A tight characterization of the
rate-region for various scenarios is given in [6], and
computationally efficient codes that achieve these rate-regions are
presented. However, the techniques used in characterizing the
rate-region of causal adversaries over large-$q$ channels do not work
over binary channels. This is because each symbol in a large-$q$
channel can contain within it a "small" hash that can be used to
verify the symbol. This is the crux of the technique used to achieve
the lower bounds in [6]. We currently do not know how to extend this
method to binary channels. Conversely, for upper bounds, the geometry
of the space of length-n codewords over large-$q$ alphabets is
significantly different from that corresponding to binary alphabets.
For instance, for large-$q$ channels the volume of an n-sphere of
radius $\alpha n$ ($0 < \alpha < 1$) over $F_q$ is approximately
$q^{\alpha n}$. This leads to simpler bounds for large-$q$ channels.

In this work we initiate the study of binary causal-adversary
channels, and present two upper bounds on their capacity: $1 - H(p)$
and $(1 - 4p)^+$. The upper bound of $1 - H(p)$ is very "natural".
Namely, it is not hard to verify that if Calvin attacks Alice's
transmission by simulating the well-studied Binary Symmetric Channel
[4], he can force a communication rate of no more than $1 - H(p)$.
The upper bound of $(1 - 4p)^+$ presented in this work is non-trivial
for both its implications and its proof techniques. The bound
demonstrates that at least for some values of p, the achievable rate
is bounded away from $1 - H(p)$. For $p \in (p_0, 0.5)$, $1 - 4p$ is
strictly less than $1 - H(p)$ (here $p_0$ is the positive value of p
satisfying $H(p) = 4p$, and can be computed to be approximately
0.15642...). In fact for $p \in (0.25, 0.5)$ our bound implies that
no communication at positive rate is possible, which is much stronger
than the result obtained by the upper bound of $1 - H(p)$ (see
Fig. 1). Our proof techniques include a combination of tools from the
fields of Extremal Combinatorics (e.g. Turán's theorem [17]), and
classical Coding Theory (e.g. the Plotkin bound [14], [2]).
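
The two bounds and the crossover point $p_0$ are easy to check numerically. The following is a minimal Python sketch; the function names and the bisection bracket [0.05, 0.25] are illustrative choices and are not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def upper_bound(p: float) -> float:
    """The bound min{1 - H(p), (1 - 4p)^+} on the achievable rate."""
    return min(1.0 - binary_entropy(p), max(1.0 - 4.0 * p, 0.0))


def crossover_point(lo: float = 0.05, hi: float = 0.25, iters: int = 60) -> float:
    """Bisection for the positive root p_0 of H(p) = 4p.

    H(p) - 4p is positive at lo and negative at hi (assumed bracket).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) - 4.0 * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Below $p_0$ the entropy term is the smaller of the two, above $p_0$ the linear term takes over, and from p = 0.25 onward the bound is zero.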

II. Model 

For any integer i let [i] denote the set $\{1, \ldots, i\}$. Let $R > 0$
be Alice's rate. An (n, Rn)-code $\mathcal{C}$ is defined by Alice's
encoder and Bob's corresponding decoder, as below.
Alice: Alice's message u is assumed to be a random variable
U with entropy Rn, over alphabet $\mathcal{U}$. We consider two types
of encoding schemes for Alice.

For deterministic codes, Alice's message U is assumed to
be uniformly distributed over $\mathcal{U} = [2^{Rn}]$. Her deterministic
encoder is a deterministic function that maps every
u in $[2^{Rn}]$ to a vector $x(u) = (x_1, \ldots, x_n)$ in $\{0,1\}^n$.
Alice's codebook $\mathcal{X}$ is the collection $\{x(u)\}$ of all possible
transmitted codewords.

More generally, Alice and Bob may use probabilistic
codes. For such codes, the random variable U corresponding
to Alice's message may have an arbitrary distribution
$p_U$ (with entropy Rn) over an arbitrary alphabet $\mathcal{U}$. Alice's
codebook $\mathcal{X}$ is an arbitrary collection $\{\mathcal{X}(u)\}$ of subsets of
$\{0, 1\}^n$. For each subset $\mathcal{X}(u) \subset \mathcal{X}$, there is a corresponding
codeword random variable X(u) with codeword distribution
$p_{X(u)}$ over $\mathcal{X}(u)$. For any value U = u of the message,
Alice's encoder chooses a codeword from $\mathcal{X}(u)$ randomly
from the distribution $p_{X(u)}$. Alice's message distribution $p_U$,
codebook $\mathcal{X}$, and all the codebook distributions $p_{X(u)}$ are
known to both Bob and Calvin, but the values of the random
variables U and X(.) are unknown to them. If $\mathcal{X}(u) =
\{x(u, r) : r \in \Lambda_u\}$, then the transmitted codeword X(U) has
the probability distribution given by $\Pr[X(U) = x(u, r)] =
p_U(u) p_{X(u)}(x(u, r))$. Let p be the overall distribution of
codewords x = x(u, r) of Alice. It holds that $p(x(u, r)) =
p_U(u) p_{X(u)}(x(u, r))$ and $p(x) = \sum_{u \in \mathcal{U}} p_U(u) p_{X(u)}(x)$.
Calvin/Channel: Calvin possesses n jamming functions $g_i(\cdot)$
and n arbitrary jamming random variables $J_i$ that satisfy the
following constraints.

Causality constraint: For each $i \in [n]$, the jamming function
$g_i(\cdot)$ maps $x^i = (x_1, \ldots, x_i)$ and $J^i = (J_1, \ldots, J_i)$ to an
element of $\{0, 1\}$.

Power constraint: The number of indices $i \in [n]$ for which
the value of $g_i(\cdot)$ equals 1 is at most pn. That is, for all
$x^n, J^n$, $\sum_{i=1}^{n} g_i(x^i, J^i) \le pn$.

The output of the channel is the set of bits $y_i = x_i \oplus g_i(x^i, J^i)$
for $i = 1, \ldots, n$.
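
The causality and power constraints can be made concrete with a small simulation. The sketch below is illustrative only: the names `transmit`, `JamFn`, `greedy_jammer` and `ones_jammer` are hypothetical, the budget pn is passed as an integer, and the jammer's private randomness $J^i$ is assumed to live inside the jamming callable.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (bits seen so far x^i, remaining flip budget) -> 0 or 1.
JamFn = Callable[[Sequence[int], int], int]


def transmit(x: Sequence[int], budget: int, jam: JamFn) -> List[int]:
    """Send x bit-by-bit through a causal jammer that may flip at most `budget` bits."""
    y: List[int] = []
    flips = 0
    for i, bit in enumerate(x):
        # Causality: the jammer is shown x_1..x_i only, never later bits.
        e = jam(x[: i + 1], budget - flips) if flips < budget else 0
        e = 1 if e else 0
        flips += e            # power constraint: total flips never exceed budget
        y.append(bit ^ e)     # y_i = x_i XOR g_i(x^i, J^i)
    return y


def greedy_jammer(prefix: Sequence[int], flips_left: int) -> int:
    """Flip every bit while budget remains."""
    return 1


def ones_jammer(prefix: Sequence[int], flips_left: int) -> int:
    """Flip only those bits whose current value is 1."""
    return prefix[-1]
```

An omniscient adversary would instead receive all of x before choosing any flip; restricting the argument to the prefix is what makes the jammer causal.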

Bob: Bob's decoder is a (potentially) probabilistic function
$h(\cdot)$ of the received vector y. It maps the vectors $y =
(y_1, \ldots, y_n)$ in $\{0, 1\}^n$ to the messages in $\mathcal{U}$.
Code parameters: Bob is said to make a decoding error
if the message u' he decodes differs from the message
u encoded by Alice. The probability of error for a given
message u is defined as the probability, over Alice, Calvin
and Bob's random variables, that Bob makes a decoding
error. The probability of error of the code $\mathcal{C}$ is defined as
the average over all $u \in \mathcal{U}$ of the probability of error for
message u.

We define two types of rates and corresponding capacities. 

The rate R is said to be weakly achievable if for every
$\varepsilon > 0$, $\delta > 0$ and every sufficiently large n there exists an
$(n, (R - \delta)n)$-code that allows communication with
probability of error at most $\varepsilon$. The supremum over n of the weakly
achievable rates is called the weak capacity and is denoted
by $C^w$.

The rate R is said to be strongly achievable¹ if for every
$\delta > 0$ there exists $\alpha > 0$ so that for sufficiently large n there
exists an $(n, (R - \delta)n)$-code that allows communication with
probability of error at most $e^{-\alpha n}$. The supremum over n of
the strongly achievable rates is called the strong capacity and
is denoted by $C^s$.

Remark: Since a rate that is strongly achievable is always
weakly achievable but the converse is not true in general,
$C^w \ge C^s$.

¹ This definition is motivated by the extensive literature on error exponents
in information theory - for large classes of information-theoretic problems,
e.g. [9], [5], the probability of error of the coding scheme is required to
decay exponentially in block length.



III. Related work and our results 

To the best of our knowledge, communication in the 
presence of a causal adversary has not been explicitly ad- 
dressed in the literature (other than our prior work for causal 
adversaries over large-q channels). Nevertheless, we note that 
the model of causal channels, being a natural one, has been 
"on the table" for several decades and the analysis of the 
online/causal channel model appears as an open question in 
the book of Csiszár and Körner [5] (in the section addressing
Arbitrarily Varying Channels [1]). Various variants of causal
adversaries have been addressed in the past, for instance [1], 
[11], [15], [16], [13] - however the models considered therein 
differ significantly from ours. 

At a high level, we show that for causal adversaries, for a 
large range of p (for all p > 0.25), the maximum achievable 
rate equals that of the classical "omniscient" adversarial 
model (i.e., 0). This may at first come as a surprise, as 
the online adversary is weaker than the omniscient one, 
and hence one may suspect that it allows a higher rate of 
communication. 

We have two main results. Theorem 1 gives an upper
bound on the weak capacity $C^w$ if Alice's encoder is
deterministic. Theorem 2 gives an upper bound on the strong
capacity $C^s$ in the more general case where Alice's encoder
is probabilistic. Due to certain limitations of our proof
techniques, we do not present any bounds on the weak
capacity in the latter setting. The upper bound in both cases
equals $\min\{1 - H(p), (1 - 4p)^+\}$.

Theorem 1 (Deterministic encoder): For deterministic
codes, $0 \le C^w \le \min\{1 - H(p), (1 - 4p)^+\}$.

Theorem 2 (Probabilistic encoder): For probabilistic
codes, $0 \le C^s \le \min\{1 - H(p), (1 - 4p)^+\}$.

We note that under a very weak notion of capacity in which
one only requires the success probability to be bounded
away from zero (instead of approaching 1), the capacity of
the omniscient channel, and thus the binary causal-adversary
channel, approaches $1 - H(p)$. This follows from the fact that
for n sufficiently large and a suitable list size $\ell$ there exist (n, Rn) codes
which are $(\ell, pn)$ list decodable with $R = 1 - H(p)(1 +
1/\ell)$ [7]. Communicating using an $(\ell, pn)$ list decodable
code allows Bob to decode a list of $\ell$ messages
which includes the message transmitted by Alice. Choosing
a message uniformly at random from his list, Bob decodes
correctly with probability at least $1/\ell$.

A. Outline of proof techniques 

The upper bound of $1 - H(p)$ follows directly by describing
an attack for Calvin wherein he approximately simulates
a BSC(p) (Binary Symmetric Channel [4] with crossover
probability p). More precisely, for each $i \in [n]$ and any
sufficiently small $\varepsilon > 0$, Calvin flips $x_i$ with probability
$p - \varepsilon$ until he runs out of his budget of pn bit-flips. By
the Chernoff bound [3], with very high probability he does
not run out of his budget, and is therefore indistinguishable
from a BSC($p - \varepsilon$). But it is well-known [4] that in this
case the optimal rate of communication from Alice to Bob
is $1 - H(p - \varepsilon)$. Taking the limit as $\varepsilon \to 0$ implies our
bound.
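
As an illustration, Calvin's BSC-simulating attack can be sketched in a few lines of Python. The function name `bsc_attack` and the returned `budget_reached` flag are hypothetical; the flag is a conservative proxy for "running out of budget", since it is set as soon as pn flips have been spent with bits still remaining.

```python
import math
import random
from typing import List, Sequence, Tuple


def bsc_attack(
    x: Sequence[int], p: float, eps: float, rng: random.Random
) -> Tuple[List[int], bool]:
    """Flip each bit independently w.p. p - eps, stopping once pn flips are spent."""
    n = len(x)
    budget = math.floor(p * n + 1e-9)  # pn, guarded against float round-off
    y = list(x)
    flips = 0
    budget_reached = False
    for i in range(n):
        if flips >= budget:
            budget_reached = True      # no flips left for the remaining bits
            break
        if rng.random() < p - eps:
            y[i] ^= 1
            flips += 1
    return y, budget_reached
```

For example, with n = 2000, p = 0.1 and eps = 0.03 the expected number of flips is 140 against a budget of 200, so by the Chernoff bound the flag is set only with very small probability.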

The upper bound of $(1 - 4p)^+$ is more involved. For the
case where Alice's encoder is deterministic, the proof of
Theorem 1 has the following overall structure. Assume for
the sake of contradiction that Alice attempts to communicate at
rate greater than $R = (1 - 4p)^+$. To prove our upper bound
we design the following wait-and-push attack for Calvin.

Calvin starts by waiting for Alice to transmit approximately
Rn bits. As Alice is assumed to communicate at rate
greater than R, the set of Alice's codewords $\mathcal{X}'$ consistent
with the bits Calvin has seen so far is "large" with "high
probability". Calvin constructs $\mathcal{X}'$ and chooses a codeword
x' uniformly at random from $\mathcal{X}'$. He then actively "pushes"
x in the direction of x' by flipping, with probability 1/2,
each future $x_i$ that differs from $x'_i$. If Calvin succeeds in
pushing x to a word y roughly midway between x and
x', a careful analysis demonstrates that regardless of Bob's
decoding strategy, Bob is unable to determine whether Alice
transmitted x or x', causing a decoding error with probability 1/2
in this case. So, to prove our bound, we must show that
with constant probability (independent of the block length
n) Calvin will indeed succeed in pushing x to y. Namely,
that Alice's codeword x and the codeword x' chosen at random
by Calvin are at distance at most 2pn. Roughly speaking,
we prove the above by a detailed analysis of the distance
structure of the set of codewords in any code using tools
from extremal combinatorics and coding theory.

The case where Alice's encoder may be randomized is
more technically challenging, and is considered in
Theorem 2. At a high level, the strategy of Calvin for a
probabilistic encoder follows that outlined for the deterministic
case. However, there are two main difficulties in its
extended analysis. Firstly, the symmetry between x and x'
no longer exists. Namely, the fact that Bob may not be
able to distinguish which of the two was transmitted by
Alice does not necessarily cause a significant decoding error,
since the probability of x' being transmitted by Alice may
well be significantly smaller than the probability that x
was transmitted. Secondly, the fact that both x and x' may
correspond to the same message u places the entire scheme
in jeopardy, as it then no longer matters if Bob decodes to
x or x': in both cases the decoded message will be that sent
by Alice.

To overcome these difficulties, we describe a more intricate
analysis of Calvin's attack. Roughly speaking, we prove that
a "large" subset $\mathcal{X}''$ of $\mathcal{X}'$ behaves "well". Any x' chosen
uniformly at random from $\mathcal{X}'$, with "significant" probability,
is in $\mathcal{X}''$, and has three properties corresponding to those
when Alice uses a deterministic encoder. That is, x' is
sufficiently close to x as desired, it has approximately the
same probability of transmission that x does (thus preserving
the needed symmetry), and it also corresponds to a message
that differs from that corresponding to x. All in all, we
show that the above three properties hold with probability
1/poly(n), which suffices to bound the strong capacity of
the channel at hand (but not the weak capacity).

In case of a randomized encoder of Alice, we assume that 
the messages may have nonuniform distribution, and also any 
message is encoded into one of a set of possible codewords as 
per some probability distribution in that set. One may think 
of various other ways of encoding, for example the following, 
to confuse Calvin. But as we discuss in the next paragraph, 
such schemes are also covered in our setup. 

Multiple codebooks: In this scheme, Alice maintains a
set of codes $C_1, C_2, \ldots, C_L$. For transmitting a message u,
she randomly selects the code $C_i$ with probability $q_i$. If the
set of messages is $\mathcal{U} = \{1, 2, \ldots, M\}$ with a probability
distribution given by $p_i = \Pr\{u = i\}$, and the code $C_r$
contains the codewords $\{x(u, r) \mid u = 1, 2, \ldots, M\}$, then
in our setup, the corresponding codebook for the message u
will be $\mathcal{X}(u) = \{x(u, r) \mid r = 1, 2, \ldots, L\}$. This codebook
may have fewer than L codewords due to common codewords
in the original codes. The induced probability distribution in
this codebook of u is given by $\Pr\{x \mid u\} = \sum_{r : x(u,r) = x} q_r$.
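
The reduction from multiple codebooks to a single probabilistic codebook can be sketched as follows. The representation is an assumption made for illustration: each code is a dictionary from messages to codewords, `q[r]` is the probability of selecting code r, and `induced_codebook` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable, List, Sequence


def induced_codebook(
    codes: Sequence[Dict[Hashable, str]], q: Sequence[Fraction]
) -> Dict[Hashable, Dict[str, Fraction]]:
    """Return, for every message u, the map x -> Pr{x | u}.

    codes[r][u] is the codeword x(u, r); code r is chosen with probability q[r].
    Codewords shared by several codes are merged and their probabilities summed.
    """
    induced: Dict[Hashable, Dict[str, Fraction]] = defaultdict(
        lambda: defaultdict(Fraction)
    )
    for code, q_r in zip(codes, q):
        for u, x in code.items():
            induced[u][x] += q_r
    return {u: dict(dist) for u, dist in induced.items()}
```

With two codes that agree on message 1 but not on message 2, the induced codebook of message 1 collapses to a single codeword of probability 1, while that of message 2 keeps two codewords weighted by the code-selection probabilities.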

If Alice picks a code and uses it to encode several 
messages, even then she does not gain anything. First, if 
she uses the same code to encode too many messages (and
Calvin knows the encoding scheme, as assumed), then both
Bob and Calvin will know the code used after receiving or 
'reading' some codewords. On the other hand, if a randomly 
chosen code is used only to encode a block of few messages 
this is equivalent to using a longer ('superblock') code in 
our setup. The only difference is that the probability of error 
analysed in our set up is the probability of error in decoding 
the 'superblocks' rather than the smaller blocks/codewords. 

The proofs of the upper bounds corresponding to $1 - H(p)$
have already been sketched in Section III-A. Hence we only
provide proofs of the upper bounds corresponding to
$(1 - 4p)^+$ in Theorems 1 and 2.

IV. Proof of Theorem 1

Let $R = (1 - 4p)^+ + \varepsilon$ for some $\varepsilon > 0$. Let log(.) denote
the binary logarithm, here and throughout. By assumption for
deterministic codes, Alice's message space $\mathcal{U}$ is of size $2^{Rn}$.
Here we assume that $2^{Rn}$ is an integer. This implies that
the set $\mathcal{X}$ of Alice's transmitted codewords is of size $2^{Rn}$.²

² In fact, $\mathcal{X}$ may be smaller; however, we note that for codes of optimal
rate, $|\mathcal{X}|$ is of size exactly $2^{Rn}$. If $|\mathcal{X}| < 2^{Rn}$, then for some transmitted
codeword x at least two messages u and u' must both be encoded to x.
On receiving x, Bob's probability of error is maximal - it is at least 1/2.
Therefore changing the codebook so as to encode u' as some $x' \notin \mathcal{X}$ cannot
increase the probability of decoding error.

We now present Calvin's attack. We show that for any
fixed $\varepsilon > 0$, regardless of Bob's decoding strategy, there
is a decoding error with constant probability (namely, the
error probability is independent of n). Calvin's attack is in
two stages. First Calvin passively waits until Alice transmits
$\ell = (R - \varepsilon/2)n$ bits over the channel. Let $x^\ell \in \{0, 1\}^\ell$ be
the value of the codeword observed so far. He then considers
the set of codewords that are consistent with the observed $x^\ell$.
Namely, Calvin constructs the set $\mathcal{X}|_{x^\ell} = \{x = x_1, \ldots, x_n \in
\mathcal{X} \mid x_1, \ldots, x_\ell = x^\ell\}$. He then chooses an element $x' \in \mathcal{X}|_{x^\ell}$
uniformly at random. In the second stage, Calvin follows a
random bit-flip strategy. That is, for each remaining bit $x'_i$ of
x' that differs from the corresponding bit $x_i$ of x transmitted,
he flips the transmitted bit with probability 1/2, until he has
either flipped pn bits, or until i = n.
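
The two stages of the attack translate directly into code. The following Python sketch is illustrative only; `wait_and_push`, `hamming` and the tuple representation of codewords are assumptions, and the transmitted word x is assumed to belong to the codebook so that the consistent set is never empty.

```python
import random
from typing import Sequence, Tuple

Word = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance between two equal-length words."""
    return sum(s != t for s, t in zip(a, b))


def wait_and_push(
    x: Word, codebook: Sequence[Word], ell: int, budget: int, rng: random.Random
) -> Tuple[Word, Word]:
    """Return (y, x_prime): the received word and the codeword Calvin pushed toward."""
    # Stage 1: wait for ell bits, then collect the codewords sharing that prefix.
    consistent = [c for c in codebook if c[:ell] == x[:ell]]
    x_prime = rng.choice(consistent)  # uniform choice, deterministic-encoder case
    # Stage 2: flip, w.p. 1/2, each later bit on which x and x_prime disagree.
    y = list(x)
    flips = 0
    for i in range(ell, len(x)):
        if flips >= budget:
            break
        if x[i] != x_prime[i] and rng.random() < 0.5:
            y[i] ^= 1
            flips += 1
    return tuple(y), x_prime
```

Note that Calvin only ever inspects `x[i]` at step i, so the attack respects the causality constraint, and y can differ from x only on positions where x and x' disagree.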

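We analyze Calvin's attack by a series of claims. We first
show that with high probability (w.h.p.) the set $\mathcal{X}|_{x^\ell}$ is large.

Claim 4.1: With probability at least $1 - 2^{-\varepsilon n/4}$, the set
$\mathcal{X}|_{x^\ell}$ is of size at least $2^{\varepsilon n/4}$.

Proof: The number of messages u for which $\mathcal{X}|_{x^\ell(u)}$
is of size less than $2^{\varepsilon n/4}$ is at most the number of distinct
prefixes $x^\ell$ times $2^{\varepsilon n/4}$, which in turn is at most
$2^{\ell + \varepsilon n/4} = 2^{(R - \varepsilon/4)n}$. Since the $2^{Rn}$ messages are equally
likely, such messages occur with probability at most $2^{-\varepsilon n/4}$. ■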

Now assume that the message u is such that its corresponding
set $\mathcal{X}|_{x^\ell}$ is of size at least $2^{\varepsilon n/4}$. We now show that this
implies that the transmitted codeword x and the codeword x'
chosen by Calvin are distinct and of small Hamming distance
apart with a positive probability (independent of n).

Claim 4.2: Conditioned on Claim 4.1, with probability at
least $\frac{\varepsilon}{64p}$, $x \ne x'$ and $d_H(x, x') < 2pn - \varepsilon n/8$.

Proof: Consider the undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ in
which the vertex set $\mathcal{V}$ consists of the set $\mathcal{X}|_{x^\ell}$ and two
nodes are connected by an edge if their Hamming distance
is less than $d = 2pn - \varepsilon n/8$. An independent set $\mathcal{I}$ in $\mathcal{G}$
corresponds to a subset of codewords in $\{0, 1\}^n$ that are all
(pairwise) at distance at least d.

Since the codewords in $\mathcal{X}|_{x^\ell}$ all have the same prefix $x^\ell$,
one may consider only the suffix (of length $n - \ell = 4pn -
\varepsilon n/2$) of the codewords in $\mathcal{X}|_{x^\ell}$. Here we assume p < 0.25;
minor modifications in the proof are needed for larger p. The
set of vectors defined by the suffixes in an independent set
$\mathcal{I}$ of $\mathcal{G}$ now corresponds to a binary error-correcting code
of length $4pn - \varepsilon n/2$, with $|\mathcal{I}|$ codewords and minimum
distance d.

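By Plotkin's bound [2] there do not exist binary error-correcting
codes of length $4pn - \varepsilon n/2$ and minimum distance d with more than
$\frac{2d}{2d - (4pn - \varepsilon n/2)} + 1$ codewords.
Thus $\mathcal{I}$, any maximal independent set in $\mathcal{G}$, must satisfy

$$|\mathcal{I}| \le \frac{2(2pn - \varepsilon n/8)}{2(2pn - \varepsilon n/8) - 4pn + \varepsilon n/2} + 1 = \frac{16p}{\varepsilon}. \qquad (1)$$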
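
The arithmetic behind (1) can be verified with exact rational numbers. This is a small sketch under the form of the Plotkin bound quoted above; the helper name `plotkin_size_bound` is hypothetical.

```python
from fractions import Fraction


def plotkin_size_bound(p: Fraction, eps: Fraction, n: int) -> Fraction:
    """Evaluate 2d / (2d - m) + 1 with d = 2pn - eps*n/8 and m = 4pn - eps*n/2."""
    d = 2 * p * n - eps * n / 8        # minimum distance of the suffix code
    m = 4 * p * n - eps * n / 2        # length of the suffix code
    assert 2 * d > m, "Plotkin's bound in this form needs 2d > m"
    return 2 * d / (2 * d - m) + 1
```

The denominator simplifies to $2d - m = \varepsilon n/4$, so the bound equals $(4pn - \varepsilon n/4)/(\varepsilon n/4) + 1 = 16p/\varepsilon$ and is independent of n.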

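By Turán's theorem [17], any undirected graph $\mathcal{G}$ of size
$|\mathcal{V}|$ and average degree $\Delta$ has an independent set of size
at least $|\mathcal{V}|/(\Delta + 1)$. This, along with (1), implies that the
average degree of our graph $\mathcal{G}$ satisfies

$$\frac{|\mathcal{V}|}{\Delta + 1} \le |\mathcal{I}| \le \frac{16p}{\varepsilon}.$$

This in turn implies that

$$\Delta \ge \frac{\varepsilon |\mathcal{V}|}{16p} - 1 \ge \frac{\varepsilon |\mathcal{V}|}{32p}.$$

The second inequality is for large enough n, since $|\mathcal{V}|$ is of
size at least $2^{\varepsilon n/4}$. To summarize the above discussion, we
have shown that our graph $\mathcal{G}$ has large average degree
$\Delta \ge \frac{\varepsilon |\mathcal{V}|}{32p}$. We now use this fact to analyze Calvin's
attack.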
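
The graph used in this argument is easy to build for toy codebooks. The sketch below constructs the distance graph and extracts an independent set with the standard minimum-degree greedy procedure, which is one constructive way to reach the $|\mathcal{V}|/(\Delta + 1)$ guarantee; all function names are illustrative and not taken from the paper.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Set, Tuple

Word = Tuple[int, ...]


def hamming(a: Word, b: Word) -> int:
    """Hamming distance between two equal-length words."""
    return sum(s != t for s, t in zip(a, b))


def all_words(n: int) -> List[Word]:
    """All binary words of length n (toy vertex set)."""
    return list(product((0, 1), repeat=n))


def distance_graph(words: Sequence[Word], d: int) -> Dict[Word, Set[Word]]:
    """Connect two words by an edge when their Hamming distance is below d."""
    adj: Dict[Word, Set[Word]] = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if hamming(a, b) < d:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_independent_set(adj: Dict[Word, Set[Word]]) -> List[Word]:
    """Repeatedly pick a minimum-degree vertex and delete it with its neighbours."""
    remaining = {v: set(nb) for v, nb in adj.items()}
    chosen: List[Word] = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            del remaining[u]
        for nb in remaining.values():
            nb -= removed
    return chosen
```

The independent set returned is a code of minimum distance at least d, which is exactly the object to which Plotkin's bound is applied in the proof.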



By the definition of deterministic codes, any codeword in
$\mathcal{X}$ is transmitted with equal probability. Also, by definition
both x (the transmitted codeword) and x' (the codeword
chosen by Calvin) are in $\mathcal{V} = \mathcal{X}|_{x^\ell}$. Hence both x and x' are
uniform in $\mathcal{X}|_{x^\ell}$. This implies that with probability $|\mathcal{E}|/|\mathcal{V}|^2$
the nodes corresponding to codewords x and x' are distinct
and connected by an edge in $\mathcal{G}$. This in turn implies that with
probability $|\mathcal{E}|/|\mathcal{V}|^2$, $x \ne x'$ and $d_H(x, x') < 2pn - \varepsilon n/8$,
as required. Now

$$\frac{|\mathcal{E}|}{|\mathcal{V}|^2} = \frac{\Delta |\mathcal{V}|}{2|\mathcal{V}|^2} \ge \frac{\varepsilon}{64p}.$$

■

Conditioned on Claim 4.2, Calvin's codeword x' is
very close to Alice's transmitted codeword x. Specifically,
$d_H(x, x') \in (0, 2pn - \varepsilon n/8)$. We now show that if Calvin
follows the random bit-flip strategy, from Bob's perspective
(w.h.p.), x and x' were equally likely to have been
transmitted by Alice.

We first show that during Calvin's random bit-flip process, 
w.h.p., Calvin does not "run out" of his budget of pn bit flips. 

Claim 4.3: Conditioned on Claim 4.2, with probability at
least $1 - 2^{-\Omega(\varepsilon^2 n)}$,

$$d_H(x, y) \in \left(\frac{d}{2} - \frac{\varepsilon n}{16}, \frac{d}{2} + \frac{\varepsilon n}{16}\right). \qquad (2)$$

Proof: The expected number of locations flipped by
Calvin is $d/2 < pn - \varepsilon n/16$. Assume that $d/2 = pn - \varepsilon n/16$
(for smaller values of d the bound is only tighter). By Sanov's
theorem [4, Theorem 12.4.1], the probability that the number
of bits flipped by Calvin deviates from the expectation d/2
by more than $\varepsilon n/16$ is at most $e^{-\Omega(\varepsilon^2 n)}$ for
large enough n. ■
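
The concentration in Claim 4.3 can be observed numerically for moderate block lengths. The sketch below sums the Binomial(d, 1/2) tail in log-space and compares it with the Hoeffding bound $2e^{-2t^2/d}$; Hoeffding is used here as a simpler stand-in for Sanov's theorem, and both function names are illustrative.

```python
import math


def deviation_probability(d: int, t: float) -> float:
    """P[|B - d/2| > t] for B ~ Binomial(d, 1/2), summed numerically in log-space."""
    log_half_pow = -d * math.log(2.0)
    total = 0.0
    for k in range(d + 1):
        if abs(k - d / 2) > t:
            log_pmf = (
                math.lgamma(d + 1)
                - math.lgamma(k + 1)
                - math.lgamma(d - k + 1)
                + log_half_pow
            )
            total += math.exp(log_pmf)
    return total


def hoeffding_bound(d: int, t: float) -> float:
    """Two-sided Hoeffding bound on the same deviation probability."""
    return 2.0 * math.exp(-2.0 * t * t / d)
```

For instance, p = 0.2, $\varepsilon$ = 0.2 and n = 4000 give d = 1500 and a deviation threshold of $\varepsilon n/16$ = 50; quadrupling n to 16000 gives d = 6000 and threshold 200, and the deviation probability drops sharply, in line with the exponential decay in n.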
It should be noted that $d/2 + \varepsilon n/16 < pn$, and so
$d_H(x, y) < d/2 + \varepsilon n/16$ implies that the number of bits
flipped by Calvin does not exceed pn. Since Calvin possibly
flips only the bits of x which differ from the corresponding
bits in x', (2) also implies

$$d_H(x', y) \in \left(\frac{d}{2} - \frac{\varepsilon n}{16}, \frac{d}{2} + \frac{\varepsilon n}{16}\right). \qquad (3)$$

We conclude by proving that if the number of bits flipped
by Calvin lies in the range $(d/2 - \varepsilon n/16, d/2 + \varepsilon n/16)$, then
indeed Bob cannot distinguish between the case in which x
or x' were transmitted.

Claim 4.4: Conditioned on Claim 4.3, Bob makes a
decoding error with probability at least 1/2.

Proof: By Bayes' Theorem [8], if Bob receives y,
the a posteriori probability that Alice transmitted x, denoted
p(x|y), equals p(y|x)p(x)/p(y). Here p(x) is the probability
(over her encoding strategy) that Alice transmits x, p(y|x)
is the probability (over Calvin's random bit-flipping strategy)
that Bob receives y given that Alice transmits x, and p(y)
is the resulting probability that Bob receives y. Similarly,
p(x'|y) = p(y|x')p(x')/p(y). Taking the ratio and noting
that for deterministic codes p(x) = p(x'), we have

$$p(x|y)/p(x'|y) = p(y|x)/p(y|x'). \qquad (4)$$

Since Calvin's random bit-flip strategy involves him
flipping bits of x (which are different from the corresponding
bits of x') with probability 1/2, for all y satisfying (2), the
probabilities p(y|x) and p(y|x') are equal. This observation
and (4) together imply p(x|y) = p(x'|y). Thus, Bob cannot
distinguish whether x or x' was transmitted. Namely, on the
pair of events in which Alice transmits x and Calvin chooses
x' and in which Alice transmits x' and Calvin chooses x, no
matter which decoding process Bob uses, he will have an
average decoding error of at least 1/2. This suffices to prove
our assertion. ■
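
The symmetry $p(y|x) = p(y|x')$ can be checked by brute force on a toy example. The sketch below assumes that the flip budget is never exhausted (which Claim 4.3 guarantees w.h.p.) and that the two codewords share their first `ell` bits; `likelihood` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def likelihood(
    y: Sequence[int], sent: Sequence[int], target: Sequence[int], ell: int
) -> Fraction:
    """P[Bob receives y | Alice sent `sent`, Calvin pushes toward `target`]."""
    prob = Fraction(1)
    for i, (yi, si, ti) in enumerate(zip(y, sent, target)):
        if i >= ell and si != ti:
            prob *= Fraction(1, 2)   # y_i is x_i or its flip, each w.p. 1/2
        elif yi != si:
            return Fraction(0)       # Calvin never touches this position
    return prob
```

Swapping the roles of the two codewords leaves the likelihood of every y unchanged, which is the observation that, combined with (4), makes the two posteriors equal.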

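Thus a decoding error happens if the conditions of
Claims 4.1, 4.2, 4.3 and 4.4 are all satisfied. This happens
with probability at least

$$\left(1 - 2^{-\varepsilon n/4}\right) \left(\frac{\varepsilon}{64p}\right) \left(1 - 2^{-\Omega(\varepsilon^2 n)}\right) \left(\frac{1}{2}\right) > \left(\frac{1}{2}\right) \left(\frac{\varepsilon}{64p}\right) \left(\frac{1}{2}\right) \left(\frac{1}{2}\right) = \frac{\varepsilon}{512p}$$

for large enough n.

■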

V. Proof of Theorem 2

We start by proving the following technical lemma that we
use in our proof. Let q be an arbitrary probability distribution
over an index set $I = \{1, \ldots, k\}$. Let $A_1, \ldots, A_k$ be
arbitrary discrete random variables with probability distributions
$q_1, \ldots, q_k$ over alphabets $\mathcal{A}_1, \ldots, \mathcal{A}_k$ respectively. Let
$k_i = |\mathcal{A}_i|$. Let A be a random variable that equals the random
variable $A_i$ with probability q(i). Then the following lemma
describing an elementary property of the entropy function
$H(\cdot)$ is useful in the proof of Theorem 2.

Lemma 5.1: The entropies of $A, A_1, \ldots, A_k$ and q satisfy
$H(A) \le \sum_{i=1}^{k} q(i) H(A_i) + H(q)$, with equality if and only
if for each $i \ne i'$ for which both q(i) and q(i') are positive it
holds that $\Pr_{q_i, q_{i'}}[A_i = A_{i'}] = 0$.

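Proof: For any $a \in \mathcal{A} = \bigcup_i \mathcal{A}_i$, the probability $\Pr\{A = a\} =
p(a)$ of occurrence of a equals $\sum_{i : a \in \mathcal{A}_i} q(i) q_i(a)$. Writing
$q_i(j)$ for the probability of the j-th element $a_i(j)$ of $\mathcal{A}_i$, we have

$$\begin{aligned}
H(A) &= -\sum_{a \in \bigcup_i \mathcal{A}_i} p(a) \log(p(a)) \\
&\le -\sum_{i=1}^{k} \sum_{j=1}^{k_i} q(i) q_i(j) \log(q(i) q_i(j)) \qquad (5) \\
&= -\sum_{i=1}^{k} \sum_{j=1}^{k_i} q(i) q_i(j) \log(q(i)) - \sum_{i=1}^{k} \sum_{j=1}^{k_i} q(i) q_i(j) \log(q_i(j)) \\
&= -\sum_{i=1}^{k} q(i) \log(q(i)) - \sum_{i=1}^{k} q(i) \sum_{j=1}^{k_i} q_i(j) \log(q_i(j)) \\
&= \sum_{i=1}^{k} q(i) H(A_i) + H(q).
\end{aligned}$$

Here (5) follows from Jensen's inequality, e.g. [4], with
equality if and only if for each positive $\Pr\{A = a\}$, there is
a unique i such that $q(i) q_i(j) > 0$ (here $a_i(j) = a$). ■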
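
Lemma 5.1 is simple to test numerically on small mixtures. In the sketch below, distributions are represented as dictionaries from symbols to probabilities; this representation and the names `entropy`, `mixture` and `lemma_sides` are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Dist = Dict[Hashable, float]


def entropy(dist: Dist) -> float:
    """Shannon entropy in bits of a distribution given as symbol -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mixture(q: Sequence[float], components: Sequence[Dist]) -> Dist:
    """Distribution of A, which equals A_i with probability q[i]."""
    mix: Dict[Hashable, float] = defaultdict(float)
    for q_i, comp in zip(q, components):
        for a, p_a in comp.items():
            mix[a] += q_i * p_a
    return dict(mix)


def lemma_sides(q: Sequence[float], components: Sequence[Dist]) -> Tuple[float, float]:
    """Return (H(A), sum_i q(i) H(A_i) + H(q))."""
    lhs = entropy(mixture(q, components))
    rhs = sum(q_i * entropy(c) for q_i, c in zip(q, components))
    rhs += entropy(dict(enumerate(q)))
    return lhs, rhs
```

Two components with overlapping supports give a strict inequality, while components with disjoint supports give equality, matching the equality condition of the lemma.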



We now turn to prove Theorem 2. Recall our notation: let
U be the random variable corresponding to Alice's message
and $p_U$ its distribution (with entropy Rn). Throughout we
assume the message set $\mathcal{U}$ (the support of U) is at most
of size $2^n$. Let $\mathcal{X}$ be Alice's codebook. $\mathcal{X}$ is a collection
$\{\mathcal{X}(u)\}$ of subsets of $\{0, 1\}^n$. For each subset $\mathcal{X}(u) \subset \mathcal{X}$,
there is a corresponding codeword random variable X(u)
with codeword distribution $p_{X(u)}$ over $\mathcal{X}(u)$. For any value
U = u of the message, Alice's encoder chooses a codeword
from $\mathcal{X}(u)$ randomly from the distribution $p_{X(u)}$. Alice's
message distribution $p_U$, codebook $\mathcal{X}$, and all the codebook
distributions $p_{X(u)}$ are known to both Bob and Calvin,
but the values of the random variables U and X(.) are
unknown to them. If $\mathcal{X}(u) = \{x(u, r) : r \in \Lambda_u\}$, then the
transmitted codeword X(U) has the probability distribution
given by $\Pr[X(U) = x(u, r)] = p_U(u) p_{X(u)}(x(u, r))$. Let
p be the overall distribution of codewords x = x(u, r)
of Alice. It holds that $p(x(u, r)) = p_U(u) p_{X(u)}(x(u, r))$ and
$p(x) = \sum_{u \in \mathcal{U}} p_U(u) p_{X(u)}(x)$.

For any $\varepsilon > 0$, let $R = (1 - 4p)^+ + \varepsilon$. We start
by specifying Calvin's attack. Calvin uses a very similar
attack to the one described in the proof of Theorem 1.
That is, Calvin first passively waits until Alice transmits
$\ell = (R - \varepsilon/2)n$ bits over the channel. Let $x^\ell \in \{0, 1\}^\ell$
be the value of the codeword observed so far. He then
considers the set of codewords x(u, r) consistent with the
observed $x^\ell$. Here and throughout this section, we denote
codewords by their corresponding message u and index r
in $\mathcal{X}(u)$. As it may be that x(u, r) is exactly the same
codeword as x(u', r'), the sets in the definitions to follow
and in this section are in a sense multisets. Namely, Calvin
constructs the set $\mathcal{X}|_{x^\ell} = \{x(u, r) = x_1, \ldots, x_n \in \mathcal{X} \mid
x_1, \ldots, x_\ell = x^\ell\}$. Let $p(x^\ell) = p(\mathcal{X}|_{x^\ell})$ be the probability,
under the probability distribution p, corresponding to the
event that Calvin observes $x^\ell$ in the first $\ell$ transmissions. Let
$p_{U|x^\ell}$ and $p_{X(u)|x^\ell}$ be the probability distributions $p_U$ and
$p_{X(u)}$ respectively conditioned on the same event. Calvin
then chooses an element $x'(u', r') \in \mathcal{X}|_{x^\ell}$ with probability³
$p_{U|x^\ell}(u') p_{X(u')|x^\ell}(x'(u', r'))$. In the second stage he then
follows exactly the same random bit-flip strategy as in the
proof of Theorem 1.

Recall that in the proof of Theorem 1 our goal was
to prove that with some constant probability, the distance
between x(u, r) and x'(u', r') is approximately 2pn. Loosely
speaking, this allows the success of Calvin's attack (i.e.,
implies a decoding error). Following the same outline of
proof, we now show that with probability 1/poly(n) the
codeword x'(u', r') chosen by Calvin has the following three
properties:

• Its corresponding message differs from that corresponding
to x(u, r) (i.e., $u \ne u'$).
• x'(u', r') is close to x(u, r) and thus Calvin will be able
to "push" x(u, r) to a word y at approximately the
same distance from x(u, r) and x'(u', r').
• Given y, Bob is unable to distinguish whether x(u, r)
or x'(u', r') was transmitted.

³ This is one significant difference from the attack in the proof of
Theorem 1 - there Calvin chooses each x' uniformly at random from the
corresponding consistent set.

To this end, we partition the set $\mathcal{X}|_{x^\ell}$ into disjoint subsets
$\mathcal{X}_{ij}$ for $i, j \in \{1, 2, \ldots, n\}$. Let $p(\mathcal{X}_{ij})$ be the probability
mass of $\mathcal{X}_{ij}$. Let $p_{U|ij}$ and $p_{X(u)|ij}$ be the probability
distributions $p_U$ and $p_{X(u)}$ respectively conditioned on the
event that Alice transmitted x(u, r) in $\mathcal{X}_{ij}$. The partition
$\mathcal{X}_{ij}$ is obtained in two steps - first we partition $\mathcal{X}|_{x^\ell}$ into n
subsets $\mathcal{X}_i$, then we partition each $\mathcal{X}_i$ into n sets $\mathcal{X}_{ij}$. We
also use $p(\mathcal{X}_i)$, $p_{U|i}$ and $p_{X(u)|i}$,
defined accordingly. All in all, we prove the existence of a
subset $\mathcal{X}_{ij}$ with the following properties:

• $H(p_{U|ij})$ is "large".
• $p(\mathcal{X}_{ij})$ is large with respect to $p(x^\ell)$.
• For any $x(u, r) \in \mathcal{X}_{ij}$ it holds that p(x(u, r)) has
approximately the same value.
• $p_{U|ij}$ is approximately uniform on its support.

Roughly speaking, proving these properties on $\mathcal{X}_{ij}$ reduces
us to the case of a deterministic encoder (addressed in
Theorem 1) and allows us to complete our proof.

We now present our proof for the existence of $\mathcal{X}_{ij}$ as
specified above. We first show that with positive probability
the set $\mathcal{X}|_{x^\ell}$ has high entropy.

Claim 5.1: With probability at least $\varepsilon/4$, $H(p_{U|x^\ell}) \ge
\varepsilon n/4$.

Proof: Let $q$ be the probability distribution over $\{0,1\}^\ell$ for which $q(x^\ell) = p(x^\ell)$ for all possible $x^\ell \in \{0,1\}^\ell$. Let $q_{x^\ell}$ be the probability distribution $p_{U|x^\ell}$. Now using Lemma 5.1 we obtain

$$H(p_U) \leq \sum_{x^\ell} q(x^\ell) H(p_{U|x^\ell}) + H(q). \tag{6}$$

By our definitions $H(p_U) = Rn$. Moreover, $H(q) \leq \ell = (R - \varepsilon/2)n$ (since $q$ is defined over an alphabet of size $2^\ell$). Thus (6) becomes

$$\sum_{x^\ell} q(x^\ell) H(p_{U|x^\ell}) \geq Rn - (R - \varepsilon/2)n = \varepsilon n/2.$$

As the average of $H(p_{U|x^\ell})$ is at least $\varepsilon n/2$, we have $H(p_{U|x^\ell}) \geq \varepsilon n/4$ with probability at least $\varepsilon/4$ (by a Markov-type inequality; here we use the fact that $H(p_{U|x^\ell}) \leq n$). ■
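
The Markov-type step can be spelled out as follows. This is a sketch of the standard averaging argument, in which $\theta$ (a symbol introduced here for illustration, not used in the original) denotes the probability, over a prefix $x^\ell$ drawn according to $q$, that $H(p_{U|x^\ell}) \geq \varepsilon n/4$. Prefixes in this event contribute at most $n$ each to the average, and the remaining prefixes contribute less than $\varepsilon n/4$ each:

```latex
\begin{align*}
\frac{\varepsilon n}{2}
  &\le \sum_{x^\ell} q(x^\ell)\, H(p_{U|x^\ell}) \\
  &\le \theta \cdot n + (1-\theta)\cdot\frac{\varepsilon n}{4}
   \le \theta n + \frac{\varepsilon n}{4}.
\end{align*}
```

Rearranging gives $\theta \geq \varepsilon/4$, which is the probability stated in the claim.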

We now define the sets $\mathcal{X}_i$. For $i = 1,\dots,n-1$, let $\mathcal{X}_i$ be the set of codewords in $\mathcal{X}_{|x^\ell}$ for which $p(x(u,r))/p(x^\ell)$ is in the range $(2^{-3i}, 2^{-3i+3}]$. The set $\mathcal{X}_n$ is defined to be the set of codewords in $\mathcal{X}_{|x^\ell}$ for which $p(x(u,r))/p(x^\ell)$ is in the range $[0, 2^{-3n+3}]$. Let $p(\mathcal{X}_i)$ be the probability mass of $\mathcal{X}_i$; namely, $p(\mathcal{X}_i) \simeq 2^{-3i}|\mathcal{X}_i|\,p(x^\ell)$. Let $q$ be the distribution over $\{1,2,\dots,n\}$ taking the value $i$ with probability $p(\mathcal{X}_i)/p(x^\ell)$. Notice that $H(q) \leq \log(n) = o(n)$ (as its support is of size $n$). Conditioning on Claim 5.1 and using Lemma 5.1, it can be verified that

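Claim 5.2:

$$\sum_{i} q(i) H(p_{U|i}) \geq H(p_{U|x^\ell}) - H(q) \geq \varepsilon n/8. \tag{7}$$

Consider sets $\mathcal{X}_i$ with (relative) mass $q(i) \geq 1/n^2$. It holds that

$$\sum_{i \leq n-1:\; q(i) \geq 1/n^2} q(i) H(p_{U|i}) \geq \varepsilon n/16.$$

The above follows from the fact that

$$\sum_{i \leq n-1:\; q(i) < 1/n^2} q(i) H(p_{U|i}) + q(n) H(p_{U|n}) \leq \sum_{i \leq n-1:\; q(i) < 1/n^2} \frac{n}{n^2} + 2^{-n+3} n \leq 2$$

(for sufficiently large $n$). Here we use the fact that $q(n) \leq |\mathcal{X}_n|\, 2^{-3n+3}$.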
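
The first partition step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `bucket_index` and `partition` are hypothetical, each codeword is represented solely by its ratio $p(x(u,r))/p(x^\ell)$, and the bucket ranges are taken to be $(2^{-3i}, 2^{-3i+3}]$ for $i < n$, with bucket $n$ absorbing everything in $[0, 2^{-3n+3}]$.

```python
import math


def bucket_index(ratio: float, n: int) -> int:
    """Return the index i in {1, ..., n} of the bucket containing `ratio`.

    Bucket i < n is the range (2^(-3i), 2^(-3i+3)]; bucket n is [0, 2^(-3n+3)].
    `ratio` stands for p(x(u,r)) / p(x^l) and must lie in [0, 1].
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if ratio == 0.0:
        return n
    # Smallest i with 2^(-3i) < ratio, i.e. i > -log2(ratio) / 3.
    i = math.floor(-math.log2(ratio) / 3) + 1
    # Very small ratios all fall into the last bucket.
    return min(i, n)


def partition(ratios, n):
    """Group codeword positions by bucket index (hypothetical helper)."""
    buckets = {i: [] for i in range(1, n + 1)}
    for position, ratio in enumerate(ratios):
        buckets[bucket_index(ratio, n)].append(position)
    return buckets
```

Within a bucket $i < n$ any two ratios differ by a factor of at most 8, which is the sense in which codewords in the same set have approximately the same probability.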

We conclude the existence of a set $\mathcal{X}_i$ such that $q(i) \geq 1/n^2$ and $H(p_{U|i}) \geq \varepsilon n/16$. We now further partition $\mathcal{X}_i$.
For $j = 1,\dots,n-1$, let $\mathcal{X}_{ij}$ be the set of codewords $x(u,r)$ in $\mathcal{X}_i$ for which $p_{U|i}(u)$ is in the range $(2^{-3j}, 2^{-3j+3}]$. The set $\mathcal{X}_{in}$ is defined to be the set of codewords $x(u,r)$ in $\mathcal{X}_i$ for which $p_{U|i}(u)$ is in the range $[0, 2^{-3n+3}]$. Let $p(\mathcal{X}_{ij})$ be the probability mass of $\mathcal{X}_{ij}$; namely, $p(\mathcal{X}_{ij}) \simeq 2^{-3i}|\mathcal{X}_{ij}|\,p(x^\ell)$. Let $q'$ be the distribution over $\{1,2,\dots,n\}$ taking the value $j$ with probability $p(\mathcal{X}_{ij})/p(\mathcal{X}_i)$. Notice that $H(q') \leq \log(n) = o(n)$ (as its support is of size $n$). As before, conditioning on Claim 5.2 and using Lemma 5.1, it can be verified that (for the index $i$ specified above),

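Claim 5.3:

$$\sum_{j} q'(j) H(p_{U|ij}) \geq H(p_{U|i}) - H(q') \geq \varepsilon n/32. \tag{8}$$

Again, consider sets $\mathcal{X}_{ij}$ with mass $q'(j) \geq 1/n^2$. It holds that

$$\sum_{j \leq n-1:\; q'(j) \geq 1/n^2} q'(j) H(p_{U|ij}) \geq \varepsilon n/64.$$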
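
Inequalities (6), (7) and (8) share one form: the entropy of a mixture is at most the average entropy of its components plus the entropy of the mixing weights. The Python sketch below checks this numerically on toy inputs. The helper names and the toy distributions are assumptions made for illustration, and a numerical check of this kind is not a substitute for Lemma 5.1.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def mixture(weights, components):
    """Return the mixture sum_i weights[i] * components[i]."""
    size = len(components[0])
    return [
        sum(w * comp[k] for w, comp in zip(weights, components))
        for k in range(size)
    ]


def mixture_gap(weights, components):
    """Slack in H(mixture) <= sum_i w_i H(component_i) + H(weights).

    The returned value is non-negative whenever the inequality holds.
    """
    average = sum(w * entropy(comp) for w, comp in zip(weights, components))
    return average + entropy(weights) - entropy(mixture(weights, components))
```

The slack is zero when the components have disjoint supports, as the sets $\mathcal{X}_i$ do when viewed as sets of codewords, and positive when the components overlap.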

We conclude the existence of a set $\mathcal{X}_{ij}$ such that

• $H(p_{U|ij}) \geq \varepsilon n/64$.

• $p(\mathcal{X}_{ij}) \geq p(x^\ell)/n^4$.

• For any $x(u,r) \in \mathcal{X}_{ij}$ it holds that $p(x(u,r))$ is approximately $2^{-3i}p(x^\ell)$.

• For any $x(u,r) \in \mathcal{X}_{ij}$ it holds that $p_{U|ij}(u)$ is approximately equal.

The set $\mathcal{X}_{ij}$ is exactly what we are looking for. Roughly speaking, by Claim 5.1, with probability at least $\varepsilon/4$ Calvin views a prefix $x^\ell$ for which $H(p_{U|x^\ell}) \geq \varepsilon n/4$. Conditioning on this event, both Alice and Calvin choose codewords $x(u,r)$, $x'(u',r')$ in $\mathcal{X}_{ij}$ with probability at least $1/n^8$.

We now sketch the remainder of the proof, which closely follows that of Theorem 1. We partition $\mathcal{X}_{ij}$ into groups $\mathcal{X}_{ij}(u)$, each consisting of all codewords in $\mathcal{X}_{ij}$ corresponding to the message $u$. Recall that each codeword $x(u,r) \in \mathcal{X}_{ij}$ has approximately the same probability $p(x(u,r))$, and for each $x(u,r) \in \mathcal{X}_{ij}$ it holds that $p_{U|ij}(u)$ is approximately the same value. This implies that each group $\mathcal{X}_{ij}(u) \subset \mathcal{X}_{ij}$ has approximately the same size. Moreover, as $H(p_{U|ij}) \geq \varepsilon n/64$, it holds that there are at least $2^{\varepsilon n/64}$ non-empty subsets $\mathcal{X}_{ij}(u)$ in $\mathcal{X}_{ij}$.
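
The count of non-empty groups follows from the standard fact that the entropy of a distribution is at most the logarithm of its support size. Written out, with $\mathrm{supp}(\cdot)$ denoting the support (notation introduced here for illustration):

```latex
\[
\frac{\varepsilon n}{64} \;\le\; H(p_{U|ij}) \;\le\; \log_2 \bigl|\mathrm{supp}(p_{U|ij})\bigr|
\quad\Longrightarrow\quad
\bigl|\{\, u : \mathcal{X}_{ij}(u) \neq \emptyset \,\}\bigr| \;\ge\; 2^{\varepsilon n/64}.
\]
```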

So, all in all, $\mathcal{X}_{ij}$ has a very symmetric structure: it includes many groups, each consisting of elements with the same transmission probability, and each of approximately the same size and mass (w.r.t. $p$). This reduces us to the case considered in Theorem 1, in which our subset $\mathcal{X}_{|x^\ell}$ included many messages, each with the same probability. Details follow.

Consider the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ in which the vertex set $\mathcal{V}$ consists of the set $\mathcal{X}_{ij}$ and two nodes are connected by an edge if their Hamming distance is less than $d = 2pn - \varepsilon n/8$.
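
For concreteness, a graph of this kind can be built as in the Python sketch below. The function names are hypothetical, codewords are assumed to be distinct equal-length bit strings, and the threshold `d` is passed in directly (in the proof it would be $2pn - \varepsilon n/8$). The quadratic pairwise scan is for illustration only.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_graph(codewords, d):
    """Adjacency sets of the graph joining codewords at distance < d."""
    adjacency = {word: set() for word in codewords}
    for a, b in combinations(codewords, 2):
        if hamming(a, b) < d:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency
```

For example, with codewords `"0000"`, `"0001"`, `"1111"` and `d = 2`, only the first two are adjacent.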

Now, it can be verified (using analysis almost identical to that given in the proof of Theorem 1) that

1) With probability at least $1 - 2^{-\Omega(\varepsilon n)}$ the codewords $x(u,r)$ and $x'(u',r')$ satisfy $u \neq u'$. Here one needs to take into consideration the slight difference in the group sizes and the probabilities for each codeword.

2) With probability [...] the vertices in $\mathcal{G}$ corresponding to $x(u,r)$ and $x'(u',r')$ are connected by an edge.

3) During Calvin's random bit-flip process, with high probability of $1 - 2^{-\Omega(\varepsilon^2 n)}$, Calvin does not "run out" of his budget of $pn$ bit flips.

4) Conditioning on the above, Bob cannot distinguish between the case in which $x(u,r)$ was transmitted and the case in which $x'(u',r')$ was transmitted.

5) Finally, on the pair of events in which Alice transmits $x(u,r)$ and Calvin chooses $x'(u',r')$, and Alice transmits $x'(u',r')$ and Calvin chooses $x(u,r)$, no matter which decoding process Bob uses, he has an average decoding error that is bounded away from zero. Here again we take into account the slight differences between $p(x(u,r))$ and $p(x'(u',r'))$.

To summarize, Calvin causes a decoding error with probability $\mathrm{poly}(\varepsilon)/\mathrm{poly}(n) = \Omega(1/\mathrm{poly}(n))$, as desired. This concludes our proof. ■

VI. Conclusions 

We analyze the capacity of the causal-adversarial channel and show (for both deterministic and probabilistic encoders) that the capacity is bounded from above by $\min\{1 - H(p), (1 - 4p)^+\}$. For a large range of $p$ (for all $p > 0.25$), the maximum achievable rate equals that of the stronger classical "omniscient" adversarial model (i.e., 0).

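The upper bound is easy to evaluate numerically. The Python sketch below does so; the function names are hypothetical, and the code only evaluates the stated expression $\min\{1 - H(p), (1-4p)^+\}$ and says nothing about achievability.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def causal_upper_bound(p: float) -> float:
    """Evaluate min{1 - H(p), (1 - 4p)^+} for a jamming fraction p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return min(1.0 - binary_entropy(p), max(0.0, 1.0 - 4.0 * p))
```

At $p = 0.1$ the entropy term is the smaller of the two (about 0.531 versus 0.6), whereas at $p = 0.2$ the linear term is smaller (0.2 versus about 0.278).
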
Several questions remain open. In this work we do not address achievability results (i.e., the construction of codes). It would be very interesting to obtain codes for the causal-adversary channel that achieve a rate greater than that known for the "omniscient" adversarial model (i.e., the Gilbert-Varshamov bound) for $p < 0.25$. As we do not believe that the upper bound of $(1 - 4p)^+$ presented in this work is actually tight, such codes, if they exist, may give a hint to the correct capacity.

As done in our work on large alphabets [6], one may also consider the more general channel model in which, for a delay parameter $d \in (0, 1)$, the jammer's decision on the corruption of $x_i$ must depend solely on $x_j$ for $j < i - dn$. This might correspond to the scenario in which the error transmission of the adversarial jammer is delayed due to certain computational tasks that the adversary needs to perform. The capacity of the causal channel with delay is an intriguing problem left open in this work.